Atty. Docket No.: 042390.P6091 
Express Mail No.: EL466333305US 



UNITED STATES PATENT APPLICATION 



FOR 



METHOD, APPARATUS, AND SYSTEM FOR HIGH SPEED DATA TRANSFER 
USING SOURCE SYNCHRONOUS DATA STROBE 

Inventor(s): 



ABID AHMAD 

KATEN SHAH 
ALANKAR SAXENA 



Prepared by: 

BLAKELY SOKOLOFF TAYLOR & ZAFMAN LLP 
12400 Wilshire Boulevard, Seventh Floor 
Los Angeles, CA 90025-1026 
(714) 557-3800 




METHOD, APPARATUS, AND SYSTEM FOR HIGH SPEED DATA TRANSFER 
USING SOURCE SYNCHRONOUS DATA STROBE 

RELATED APPLICATIONS 
This application claims the benefit of U.S. Provisional Application No.
60/175,835, filed January 13, 2000.
FIELD OF THE INVENTION 

The present invention relates generally to the field of data transfer technology. 
More specifically, the present invention relates to a method, apparatus, and system for
high speed data transfer using source synchronous data strobe.
BACKGROUND OF THE INVENTION 

Currently, graphics controllers/accelerators such as the Intel 740 support a local
memory interface from 66.67 MHz to 100 MHz. A typical graphics controller such as the
Intel 740 has its own local memory that can be SDRAM or Dual Data Rate (DDR) SDRAM.
DDR SDRAM specifies data transfers at 2x the maximum transfer rate. For a 100 MHz
DDR SDRAM, control would be transferred at 1x speed (e.g., once every 100 MHz
clock) whereas data would be transferred at 2x speed (twice every 100 MHz clock). As
DRAM vendors move their silicon to next generation processes (e.g., less than or equal to
0.25 microns), the capability to produce higher frequency SDRAM parts will
increase up to a maximum of 150 MHz at the system level. The loading on control
signals is higher than that on data lines, which restricts going beyond 150 MHz. DDR
takes advantage of the lighter data load and increases the data transfer rate. As a result,
graphics controllers/accelerators need to be able to accommodate high speed data transfer
at frequencies higher than 100 MHz.
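The 1x control and 2x data rates described above can be illustrated with a short Python sketch. It assumes an idealized interface in which each data item is valid for exactly half a clock period (no skew or jitter); the function name `transfer_timing` and the dictionary keys are hypothetical and are used for illustration only.

```python
def transfer_timing(clock_mhz: float) -> dict:
    """Return idealized per-clock timing figures for a DDR interface."""
    period_ns = 1000.0 / clock_mhz  # one full clock period in nanoseconds
    return {
        "clock_period_ns": period_ns,
        # 1x: one control transfer per clock (a clock in MHz equals transfers per microsecond)
        "control_transfers_per_us": clock_mhz,
        # 2x: two data transfers per clock
        "data_transfers_per_us": 2 * clock_mhz,
        # Assumption: each data item is valid for half a clock period
        "data_window_ns": period_ns / 2,
    }


for mhz in (100.0, 150.0):
    print(mhz, transfer_timing(mhz))
```

Under these assumptions the data window narrows from 5 ns at 100 MHz to roughly 3.3 ns at 150 MHz, which is why the placement of the latching edge within the window matters more as the frequency rises.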
SUMMARY OF THE INVENTION



According to one aspect of the invention, a method is provided in which a write
strobe signal is generated to latch output data into a memory unit that comprises one or
more dual data rate synchronous dynamic random access memory (DDR-SDRAM)
devices. The write strobe signal has an edge transition at approximately the center of a
data window corresponding to the output data. A first receive clock signal is delayed by
a first delay period using a delay locked loop (DLL) circuit to generate a first delayed
receive clock signal. The first delayed receive clock signal is used to latch incoming data
from the memory unit.
BRIEF DESCRIPTION OF THE DRAWINGS



The features and advantages of the present invention will be more fully 
understood by reference to the accompanying drawings, in which: 

Figure 1 shows a block diagram of one embodiment of a system according to the 
teachings of the present invention; 

Figure 2 shows a block diagram of one embodiment of a graphics 
controller/accelerator; 

Figure 3 is a block diagram of one embodiment of a local memory interface unit 
according to the teachings of the present invention; 

Figure 4 shows a differential clocking diagram; 

Figure 5 shows a block diagram of one embodiment of a memory PLL circuit 
according to the teachings of the present invention; 

Figure 6 illustrates a block diagram of a local memory I/O structure showing the 
various interface signals between the local memory interface unit and the local memory; 
and 

Figure 7 shows an example of a timing diagram showing various interface signals 
in Figure 6. 




DETAILED DESCRIPTION 

In the following detailed description, numerous specific details are set forth in
order to provide a thorough understanding of the present invention. However, it will be
appreciated by one skilled in the art that the present invention may be practiced without
these specific details.

The present invention provides a method, an apparatus, and a system that allow high
speed data transfer at frequencies higher than 100 MHz. The high speed data transfer can
be achieved by using centered write-strobed data latching and delay locked loop (DLL)
based strobeless read data latching. It is assumed that DDR SDRAM specifications will
include the following improvements: differential clocking; differential input buffers;
additional strobe input; improved input loading; and SSTL electricals (if required). In one
embodiment, a write strobe signal is generated to latch output data transmitted from a
transmitting agent (e.g., a memory interface unit of a graphics accelerator) into a memory
unit (e.g., a local memory unit coupled to the graphics accelerator). The memory unit, in
one embodiment, includes one or more dual data rate synchronous dynamic random
access memory (DDR-SDRAM) devices. The write strobe signal is aligned with respect
to the data window corresponding to the output data so that the edge transition of the
write strobe signal occurs at about the center of the data window. In one embodiment, a
first receive clock signal is delayed by a first delay period using a delay locked loop (DLL)
circuit to generate a first delayed receive clock signal. The first delayed receive clock
signal is used to latch incoming data from the memory unit. In one embodiment, the first
delayed receive clock signal is used to clock a latching device to latch incoming data
from the memory unit. The incoming data is latched in response to the edge transition of
the first delayed receive clock signal. In one embodiment, the first delayed receive clock
signal is aligned with respect to the data window corresponding to the incoming data so
that the edge transition of the first delayed receive clock signal occurs at a point that
provides sufficient setup time and hold time for the latching device to timely latch the
incoming data from the memory unit. In one embodiment, the DLL circuit is
programmable using a register. The first delay period is adjustable using a value stored in
the register. The teachings of the present invention are applicable to any memory
interface or memory controller that is used to control the data transfer between a graphics
accelerator/controller and a corresponding local memory unit. However, the teachings of
the present invention are not limited to the memory interfaces between graphics
controllers and their corresponding local memory units and can also be applied to any
other scheme, method, apparatus, or system for high speed data transfer between a host
device and a memory device.
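The relationship between a latching edge and its data window can be sketched in Python as follows. This is a minimal illustration only, not the circuit of any embodiment: the `DataWindow` class, the `margins` and `meets_timing` helpers, and the 0.5 ns setup and hold requirements are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DataWindow:
    start_ns: float  # time at which the data becomes valid
    end_ns: float    # time at which the data stops being valid

    @property
    def center_ns(self) -> float:
        return (self.start_ns + self.end_ns) / 2


def margins(window: DataWindow, edge_ns: float) -> tuple:
    """Return the (setup, hold) time available around a latching edge."""
    return edge_ns - window.start_ns, window.end_ns - edge_ns


def meets_timing(window: DataWindow, edge_ns: float,
                 setup_ns: float, hold_ns: float) -> bool:
    """True if the edge leaves at least the required setup and hold time."""
    setup, hold = margins(window, edge_ns)
    return setup >= setup_ns and hold >= hold_ns


# Hypothetical 5 ns data window (half of a 10 ns clock period).
window = DataWindow(start_ns=0.0, end_ns=5.0)

# A strobe edge at the center of the window splits the margin evenly.
strobe_edge = window.center_ns
print(margins(window, strobe_edge))
print(meets_timing(window, strobe_edge, setup_ns=0.5, hold_ns=0.5))

# An edge close to the end of the window leaves too little hold time.
print(meets_timing(window, 4.8, setup_ns=0.5, hold_ns=0.5))
```

Placing the edge at the center gives equal setup and hold margin, which is one way to read the phrase "at about the center of the data window" for writes. For reads, any delayed receive clock edge for which `meets_timing` holds would satisfy the setup and hold condition described above.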

Figure 1 shows a block diagram of one embodiment of a system 100 according to
the teachings of the present invention. The system 100 as shown in Figure 1 includes one
or more processors 110, a chipset unit 120, a system memory unit 130, a graphics
controller/accelerator unit 140, a local memory unit 150, and various I/O devices 160. For
the purposes of the present specification, the term "processor" or "CPU" refers to any
machine that is capable of executing a sequence of instructions and shall be taken to
include, but not be limited to, general purpose microprocessors, special purpose
microprocessors, multi-media controllers and microcontrollers, etc. In one embodiment,
the processors 110 are general-purpose microprocessors that are capable of executing an
Intel Architecture instruction set. The chipset unit 120 is coupled to the processor 110 via
a host bus 115 and coupled to the memory unit 130 via a memory bus 125. The graphics
controller/accelerator 140 is coupled to the chipset unit 120 via an AGP bus 145. In one
embodiment, the chipset unit 120 may be an Intel chipset. In one embodiment, the
graphics controller/accelerator 140 may be an Intel graphics accelerator. The teachings of
the present invention, however, are not limited to Intel products and/or architecture and
are applicable to any other products and/or architecture. In one embodiment, the chipset
unit 120 includes a memory control unit (not shown) that controls the interface between
various system components and the system memory unit 130. The various I/O units 160,
in one embodiment, are coupled to the chipset unit 120 via an I/O bus or PCI bus 165.

Figure 2 shows a block diagram of one embodiment 200 of the graphics
controller/accelerator 140 described in Figure 1. The graphics controller 140, in one
embodiment, includes an AGP interface 210, a PCI interface 220, a local memory
interface 230, a clocks and reset unit 240, a general purpose I/O unit 250, a video
interface 260, a display interface 270, a digital TV Out unit 280, and a BIOS ROM 290.
The structure and operation of the local memory interface unit 230 are described in more
detail below. In one embodiment, the local memory interface 230 controls the interface
(e.g., data transfer) between the graphics accelerator 140 and the local memory unit 150.

Figure 3 shows various signal interfaces between the local memory interface unit
230 and the local memory 150. As shown in Figure 3, the local memory interface unit
230 sends data, control, and clock signals to the local memory unit 150 (e.g., SDRAM).
The local memory interface unit 230 also receives data from the local memory 150. In
one embodiment, the control and clock signals are used by the local memory interface
unit 230 to facilitate and control the data transfer between the local memory interface unit
230 and the local memory 150.

Figure 4 shows a differential clocking diagram of two clock signals oCLK and
oCLK# generated by the graphics accelerator 140 to facilitate data transfer between the
graphics accelerator 140 and the local memory 150. In one embodiment, the graphics
accelerator 140 generates two copies of the same clock, phase shifted from each other by
half a clock period. This effectively provides 2x clocking as shown in Figure 4.
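As a rough illustration of how two half-period-shifted copies of one clock yield 2x clocking, the following Python sketch lists the rising edges of both copies. The function names `rising_edges` and `two_x_edges` and the 10 ns period are assumptions made for the example, not values taken from Figure 4.

```python
def rising_edges(period_ns: float, phase_ns: float, cycles: int) -> list:
    """Rising-edge times of a clock with the given period and phase offset."""
    return [phase_ns + n * period_ns for n in range(cycles)]


def two_x_edges(period_ns: float, cycles: int) -> list:
    """Merge the rising edges of oCLK and of oCLK# (shifted by half a period)."""
    oclk = rising_edges(period_ns, 0.0, cycles)
    oclk_b = rising_edges(period_ns, period_ns / 2, cycles)
    return sorted(oclk + oclk_b)


# Hypothetical 100 MHz clock (10 ns period), three cycles.
print(two_x_edges(10.0, 3))  # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
```

The merged list has one rising edge every half period, so logic clocked alternately by the two copies can act twice per clock period.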

Figure 5 shows a block diagram of one embodiment of a phase locked loop (PLL)
circuit 500 for generating two pairs of clock signals, oCLK/oCLK# and iCLK/iCLK#, that
are used to facilitate data transfer between the graphics accelerator 140 and the local
memory 150. In one embodiment, the PLL circuit 500 as shown in Figure 5 is contained
within the local memory interface unit 230. In another embodiment, the PLL circuit 500
may be a stand alone unit or contained within another unit in the graphics accelerator 140.
In one embodiment, the oCLK/oCLK# pair is used for external DQ I/O and control
clocking and the iCLK/iCLK# pair is used internally for clocking the write strobe (also
referred to as Write QS herein). In one embodiment, the iCLK is oCLK plus a delay equal
to the period of the Fvco of the PLL. This allows for adding a fixed PLL delay to the
iCLK. This fixed delay is relatively insensitive to changes in process, temperature and
voltage. Figure 5 shows the various fixed PLL delays at corresponding frequencies. At
100 MHz, N/M is 9/2 and 1/P is 1/3, which gives a PLL delay of 3.3 ns.
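The stated figures can be checked with a short Python sketch. It assumes that the VCO frequency equals the MCLK input multiplied by N/M and that the output clock equals the VCO frequency divided by P. This reading is an assumption, chosen because it reproduces the 100 MHz and 3.3 ns values given in the text for a 66.67 MHz input; the function name `pll_clocks` and its parameters are hypothetical.

```python
from fractions import Fraction


def pll_clocks(mclk_mhz: float, ratio: Fraction, divide: int) -> tuple:
    """Return (Fvco in MHz, output clock in MHz, fixed PLL delay in ns).

    Assumptions: Fvco = MCLK * N/M, output clock = Fvco / P,
    and the fixed delay is one Fvco period.
    """
    fvco_mhz = mclk_mhz * float(ratio)
    out_mhz = fvco_mhz / divide
    delay_ns = 1000.0 / fvco_mhz  # period of Fvco in nanoseconds
    return fvco_mhz, out_mhz, delay_ns


for mclk in (66.67, 60.0):
    fvco, out, delay = pll_clocks(mclk, Fraction(9, 2), 3)
    print(f"MCLK={mclk} MHz  Fvco={fvco:.1f} MHz  clock={out:.1f} MHz  delay={delay:.2f} ns")
```

With a 66.67 MHz input this gives an Fvco of about 300 MHz, a 100 MHz clock, and a delay of about 3.33 ns. Under the same assumptions, a 60 MHz input would give a 90 MHz clock and a delay of about 3.70 ns.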

Figure 6 shows a block diagram of one embodiment of a local memory I/O
structure to facilitate data transfer between the graphics accelerator 140 and the local
memory 150 (e.g., SDRAM). As shown in Figure 6, the Memory PLL 500 receives the
MCLK clock input (at either 66.7 MHz or 60.0 MHz), a ratio input of 9/2 and a divide
input of 3 and generates two pairs of clock signals: oCLK and oCLK#, iCLK and iCLK#.
As shown in Figure 6, the oCLK signal is used to generate the tCLK (transmit clock) and
the rCLK (receive clock). The tCLK is sent to the local memory 150 (e.g., the SDRAM)
and the rCLK clock is used to latch the incoming data (data reads) from the local memory
150. The rCLK is input to a DLL circuit 627 that generates a clock signal to latch input
data coming from the local memory 150. In this embodiment, the DLL 627 is a
programmable DLL which receives the rCLK as its input and generates the output signal
which is used to clock a latching device 625 to latch data coming from the local memory
(data reads). The oCLK# signal is used to generate the tCLK# signal which is sent to the
local memory 150 (e.g., SDRAM). The oCLK and oCLK# signals are used to clock
latching devices 621 and 623, respectively. The iCLK and iCLK# are used to clock
latching devices 611 and 613, respectively. As shown in Figure 6, the iCLK signal is
used to clock the latching device 611 to send the write data strobe signal QS to the local
memory 150 for data writes. The iCLK signal is also used to clock a latching device 615
to send control signals to the local memory 150. The present invention thus provides a
mechanism for high speed data transfer (more than 100 MHz) between the local memory
150 which uses DDR SDRAM and the graphics controller/accelerator 140 by using
source synchronous data strobe for writes (i.e., the write data strobe or QS signal shown
in Figure 6) and programmable DLL for reads (i.e., DLL based strobeless read data
latching).
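One way a register value might be chosen for a programmable DLL such as the DLL 627 is sketched below in Python. The text states only that the delay is adjustable using a value stored in a register. The linear mapping from register value to delay, the 0.25 ns step, the 6-bit register width, the window position, and the sweep-and-pick-the-middle selection are all assumptions introduced for illustration and are not taken from the specification.

```python
def dll_delay_ns(reg_value: int, step_ns: float) -> float:
    """Assumed linear mapping from register value to DLL delay."""
    return reg_value * step_ns


def passing_settings(rclk_edge_ns: float, window_start_ns: float,
                     window_end_ns: float, setup_ns: float, hold_ns: float,
                     step_ns: float, reg_bits: int) -> list:
    """Sweep every register value; keep those that meet setup and hold."""
    good = []
    for value in range(2 ** reg_bits):
        edge = rclk_edge_ns + dll_delay_ns(value, step_ns)
        if edge - window_start_ns >= setup_ns and window_end_ns - edge >= hold_ns:
            good.append(value)
    return good


def pick_setting(good: list):
    """Choose the middle passing value, or None if nothing passes."""
    return good[len(good) // 2] if good else None


# Hypothetical case: read data is valid from 4.0 ns to 9.0 ns after the rCLK edge.
good = passing_settings(rclk_edge_ns=0.0, window_start_ns=4.0, window_end_ns=9.0,
                        setup_ns=0.5, hold_ns=0.5, step_ns=0.25, reg_bits=6)
print(good[0], good[-1], pick_setting(good))  # 18 34 26
```

In this hypothetical case the selected value of 26 corresponds to a 6.5 ns delay, which places the delayed receive clock edge at the middle of the assumed read data window.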

Figure 7 shows a diagram illustrating the timing of the various signals described
in Figures 5 and 6 above. As shown in Figure 7, the oCLK and oCLK# signals are two
copies of the same clock phase shifted by half a clock. The iCLK is oCLK plus a delay
equal to the period of the Fvco of the PLL. The iCLK# is shifted from the iCLK by half a
clock. The Write QS signal is used to trigger data writes to the local memory 150. The
rCLK and rCLK# signals are used to trigger data reads from the local memory 150 using
a DLL delay as shown in Figure 6.

The invention has been described in conjunction with the preferred embodiment.
It is evident that numerous alternatives, modifications, variations and uses will be
apparent to those skilled in the art in light of the foregoing description.